Goto

Collaborating Authors

 Shanxi Province





AI Agent for Source Finding by SoFiA-2 for SKA-SDC2

Zhou, Xingchen, Li, Nan, Jia, Peng, Liu, Yingfeng, Deng, Furen, Shu, Shuanghao, Li, Ying, Cao, Liang, Shan, Huanyuan, Ibitoye, Ayodeji

arXiv.org Artificial Intelligence

Source extraction is crucial in analyzing data from next-generation, large-scale sky surveys in radio bands, such as the Square Kilometre Array (SKA). Several source extraction programs, including SoFiA and Aegean, have been developed to address this challenge. However, finding optimal parameter configurations when applying these programs to real observations is non-trivial. For example, the outcomes of SoFiA intensely depend on several key parameters across its preconditioning, source-finding, and reliability-filtering modules. To address this issue, we propose a framework to automatically optimize these parameters using an AI agent based on a state-of-the-art reinforcement learning (RL) algorithm, i.e., Soft Actor-Critic (SAC). The SKA Science Data Challenge 2 (SDC2) dataset is utilized to assess the feasibility and reliability of this framework. The AI agent interacts with the environment by adjusting parameters based on the feedback from the SDC2 score defined by the SDC2 Team, progressively learning to select parameter sets that yield improved performance. After sufficient training, the AI agent can automatically identify an optimal parameter configuration that outperform the benchmark set by Team SoFiA within only 100 evaluation steps and with reduced time consumption. Our approach could address similar problems requiring complex parameter tuning, beyond radio band surveys and source extraction. Yet, high-quality training sets containing representative observations and catalogs of ground truth are essential.


RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting

Wang, Jieting, Shang, Xiaolei, Li, Feijiang, Peng, Furong

arXiv.org Artificial Intelligence

Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches-including transformer and multilayer perceptron-based models-optimize using Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. The code is publicly available at: https://github.com/shang-xl/RI-Loss.


Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling

Ye, Jin, Wang, Lingmei, Zhang, Shujian, Wu, Haihang

arXiv.org Artificial Intelligence

With the global energy transition and rapid development of renewable energy, the scheduling optimization challenge for combined power-heat systems under new energy integration and multiple uncertainties has become increasingly prominent. Addressing this challenge, this study proposes an intelligent scheduling method based on the improved Dual-Delay Deep Deterministic Policy Gradient (PVTD3) algorithm. System optimization is achieved by introducing a penalty term for grid power purchase variations. Simulation results demonstrate that under three typical scenarios (10%, 20%, and 30% renewable penetration), the PVTD3 algorithm reduces the system's comprehensive cost by 6.93%, 12.68%, and 13.59% respectively compared to the traditional TD3 algorithm. Concurrently, it reduces the average fluctuation amplitude of grid power purchases by 12.8%. Regarding energy storage management, the PVTD3 algorithm reduces the end-time state values of low-temperature thermal storage tanks by 7.67-17.67 units while maintaining high-temperature tanks within the 3.59-4.25 safety operating range. Multi-scenario comparative validation demonstrates that the proposed algorithm not only excels in economic efficiency and grid stability but also exhibits superior sustainable scheduling capabilities in energy storage device management.


Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs

Sato, Takuma, Kawano, Seiya, Yoshino, Koichiro

arXiv.org Artificial Intelligence

The ability to accurately interpret implied meanings plays a crucial role in human communication and language use, and language models are also expected to possess this capability. This study demonstrates that providing language models with pragmatic theories as prompts is an effective in-context learning approach for tasks to understand implied meanings. Specifically, we propose an approach in which an overview of pragmatic theories, such as Gricean pragmatics and Relevance Theory, is presented as a prompt to the language model, guiding it through a step-by-step reasoning process to derive a final interpretation. Experimental results showed that, compared to the baseline, which prompts intermediate reasoning without presenting pragmatic theories (0-shot Chain-of-Thought), our methods enabled language models to achieve up to 9.6\% higher scores on pragmatic reasoning tasks. Furthermore, we show that even without explaining the details of pragmatic theories, merely mentioning their names in the prompt leads to a certain performance improvement (around 1-3%) in larger models compared to the baseline.


Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement

Yang, Zhe, Li, Wenrui, Chen, Hongtao, Wang, Penghong, Xiong, Ruiqin, Fan, Xiaopeng

arXiv.org Artificial Intelligence

Abstract--Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multi-modal training, due to modality bias, the advantaged modality often dominates backpropagation, leading to imbalanced optimization. Existing methods still face two problems: First, the long-term dominance of the dominant modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often directly and uniformly adjust the gradients of the advantaged modality, ignoring the semantics and directionality between modalities. T o address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), which is inspired by information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of effective gain growth rate and redundancy to trigger intervention only when redundancy is high. Furthermore, we design a co-information gating mechanism to estimate the contribution of the current dominant modality based on cross-modal semantics. When the task primarily relies on a single modality, the suppression term is automatically disabled to preserve modality-specific information. Finally, we project the gradient of the dominant modality onto the orthogonal complement of the joint multi-modal gradient subspace and suppress the gradient according to redundancy. Experiments show that our method demonstrates superiority among current major methods in most scenarios. Ablation experiments verify the effectiveness of our method. The code is available at https://github.com/xia-zhe/RedReg.git Index T erms--Multimodal learning, modality imbalance, information bottleneck This work was supported in part by the National Key R&D Program of China (2023YFA1008501) and the National Natural Science Foundation of China (NSFC) under grant 624B2049 and U22B2035. Wenrui Li, Penghong Wang, and Xiaopeng Fan are with the Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with Harbin Institute of Technology Suzhou Research Institute, Suzhou 215104, China. Hongtao Chen is with the School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China (e-mail: ht166chen@163.com).


Uncovering and Mitigating Transient Blindness in Multimodal Model Editing

Han, Xiaoqi, Li, Ru, Yi, Ran, Tan, Hongye, Liang, Zhuomin, Gutiérrez-Basulto, Víctor, Pan, Jeff Z.

arXiv.org Artificial Intelligence

Multimodal Model Editing (MMED) aims to correct erroneous knowledge in multimodal models. Existing evaluation methods, adapted from textual model editing, overstate success by relying on low-similarity or random inputs, obscure overfitting. We propose a comprehensive locality evaluation framework, covering three key dimensions: random-image locality, no-image locality, and consistent-image locality, op-erationalized through seven distinct data types, enabling a detailed and structured analysis of multimodal edits. We introduce De-VQA, a dynamic evaluation for visual question answering, uncovering a phenomenon we term transient blindness, overfitting to edit-similar text while ignoring visuals. Token analysis shows edits disproportionately affect textual tokens. We propose locality-aware adversarial losses to balance cross-modal representations. Empirical results demonstrate that our approach consistently outperforms existing baselines, reducing transient blindness and improving locality by 17% on average.


Decomposing and Revising What Language Models Generate

Yan, Zhichao, Chen, Jiaoyan, Wang, Jiapu, Li, Xiaoli, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial Intelligence

Attribution is crucial in question answering (QA) with Large Language Models (LLMs).SOTA question decomposition-based approaches use long form answers to generate questions for retrieving related documents. However, the generated questions are often irrelevant and incomplete, resulting in a loss of facts in retrieval.These approaches also fail to aggregate evidence snippets from different documents and paragraphs. To tackle these problems, we propose a new fact decomposition-based framework called FIDES (\textit{faithful context enhanced fact decomposition and evidence aggregation}) for attributed QA. FIDES uses a contextually enhanced two-stage faithful decomposition method to decompose long form answers into sub-facts, which are then used by a retriever to retrieve related evidence snippets. If the retrieved evidence snippets conflict with the related sub-facts, such sub-facts will be revised accordingly. Finally, the evidence snippets are aggregated according to the original sentences.Extensive evaluation has been conducted with six datasets, with an additionally proposed new metric called $Attr_{auto-P}$ for evaluating the evidence precision. FIDES outperforms the SOTA methods by over 14\% in average with GPT-3.5-turbo, Gemini and Llama 70B series.